1. Field of the Invention
The present invention is directed toward the field of document classification, and more particularly toward disambiguating or validating categories preliminarily classified for documents.
2. Art Background
Documents, which include books, magazines, journals, etc., are often stored in a single repository, such as a library or an online database. The repository may store a vast number of documents that cover numerous topics. Typically, the documents in the repository are organized so that a user may locate selected documents of interest. To make documents in a repository locatable, the documents are classified. For example, libraries typically use the Dewey Decimal System to classify books and other publications into ten major categories, wherein each category is further subdivided by number.
In general, a document classification system classifies documents into one or more categories. U.S. patent application Ser. No. 08/520,499, filed Aug. 29, 1995, now pending, entitled "A Virtual Bookshelf", inventor Kelly Wical, describes a document browsing system. The virtual bookshelf described in that application catalogs documents available to a user based on the themes identified for each document. The themes are identified from terminology used in the document. A classification hierarchy, which includes categories, is used to classify the themes or terms of a document into categories.
In order to classify documents based on document terminology, a general meaning must be ascribed to the terminology. Generally, the meaning ascribed to a term should be based on the context or use of the term in the document. For example, a document may include the term "bank." Without any information on the contextual use of the term, "bank" may be associated with the category "finance & investment" to connote a financial institution, or "bank" may be associated with the category "bodies of water" (e.g., the bank of a river). Thus, to properly classify a document based on terminology in the document, the proper context of the terminology must be determined. Note that in a document classification system, if a document is misclassified because the wrong category is ascribed to a term, then the document is effectively lost for that term, because it is filed under a category where a user interested in the term's actual meaning would not look.
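The ambiguity described above can be illustrated with a minimal Python sketch. It is offered only to make the "bank" example concrete and is not the disambiguation method of the present invention. The two category names come from the text; the lookup table, the context cue words, and the function names `candidate_categories` and `disambiguate` are hypothetical.

```python
# Hypothetical lookup table: each term maps to its candidate categories, and each
# category lists a few context cue words. The cue words are illustrative assumptions.
CANDIDATE_CATEGORIES = {
    "bank": {
        "finance & investment": {"loan", "deposit", "interest", "account"},
        "bodies of water": {"river", "shore", "stream", "fishing"},
    }
}


def candidate_categories(term):
    """Return every category a term could belong to when context is ignored."""
    return sorted(CANDIDATE_CATEGORIES.get(term, {}))


def disambiguate(term, document_words):
    """Pick the candidate category whose cue words overlap most with the document.

    Returns None when no cue word appears or when the best score is tied,
    i.e. when the context does not settle the category.
    """
    candidates = CANDIDATE_CATEGORIES.get(term, {})
    words = {w.lower() for w in document_words}
    scores = {cat: len(cues & words) for cat, cues in candidates.items()}
    if not scores:
        return None
    best = max(scores.values())
    winners = [cat for cat, score in scores.items() if score == best]
    if best == 0 or len(winners) > 1:
        return None
    return winners[0]


if __name__ == "__main__":
    print(candidate_categories("bank"))
    print(disambiguate("bank", "the bank raised the interest on my account".split()))
    print(disambiguate("bank", "we sat on the bank of the river".split()))
```

Without context, the lookup alone yields both categories for "bank"; only the surrounding words allow one to be selected. When the context offers no cue, the sketch returns no category rather than guessing, which avoids filing the document under the wrong category for that term.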
As is set forth in detail below, a disambiguation system of the present invention disambiguates or validates categories, which have been preliminarily classified for a term, to provide proper classification for terms.